EBA3500 Data analysis with programming
Jonas Moss
BI Norwegian Business School
Department of Data Science and Analytics
Note
Optimization terminated successfully.
Current function value: 0.574351
Iterations 5
To interpret the coefficients we must exponentiate them.
Hence increasing gpa by 1 multiplies the odds of being admitted by roughly 1.6, i.e., a \(60\%\) increase, everything else held equal. Increasing the rank of the university by one decreases the odds by \(30\%\) though.
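A quick sketch of the exponentiation step. The coefficient values below are hypothetical, chosen to match the \(60\%\) and \(30\%\) figures above; they are not the fitted output:

```python
import numpy as np

# Hypothetical logit coefficients, roughly matching the interpretation above.
coefs = {"gpa": 0.47, "rank": -0.36}

# Exponentiating a logit coefficient gives an odds ratio.
odds_ratios = {name: np.exp(beta) for name, beta in coefs.items()}

print(odds_ratios)  # gpa: ~1.60 (a 60% increase), rank: ~0.70 (a 30% decrease)
```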
\[f(\boldsymbol{x};\boldsymbol{\mu},\Sigma)=\frac{\exp\left(-\frac{1}{2}\left(\boldsymbol{x}-\boldsymbol{\mu}\right)^{T}\Sigma^{-1}\left(\boldsymbol{x}-\boldsymbol{\mu}\right)\right)}{\sqrt{(2\pi)^{k}|\Sigma|}}\]
* \(\boldsymbol{x}\in\mathbb{R}^k\) is the vector of observations.
* \(\boldsymbol{\mu}\in\mathbb{R}^k\) is the vector of means.
* \(\Sigma\) is the \(k\times k\) covariance matrix.
* \(|A|\) denotes the determinant of \(A\).
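A numerical sanity check of the density formula, comparing a direct implementation against `scipy.stats.multivariate_normal`; the values of \(\boldsymbol{\mu}\), \(\Sigma\), and \(\boldsymbol{x}\) below are arbitrary:

```python
import numpy as np
from scipy.stats import multivariate_normal

mu = np.array([0.0, 1.0])         # mean vector, k = 2
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])    # covariance matrix
x = np.array([0.5, 0.5])          # observation vector

k = len(mu)
diff = x - mu
# The density formula written out term by term.
manual = np.exp(-0.5 * diff @ np.linalg.inv(Sigma) @ diff) / \
    np.sqrt((2 * np.pi) ** k * np.linalg.det(Sigma))
reference = multivariate_normal(mean=mu, cov=Sigma).pdf(x)
```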
Theorem 1 (Multivariate central limit theorem) Let \(\boldsymbol{X}\in\mathbb{R}^k\) be a multivariate random variable with mean \(\boldsymbol{\mu}\) and covariance matrix \(\Sigma\), and let \(\overline{\boldsymbol{X}}\) be the mean of \(n\) independent copies of \(\boldsymbol{X}\). Then \[\sqrt{n}(\overline{\boldsymbol{X}}-\boldsymbol{\mu}) \to N(0, \Sigma)\] in distribution.
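A small simulation illustrating the theorem for non-normal data; the component distributions (independent exponentials, so \(\boldsymbol{\mu} = (1, 1)\) and \(\Sigma = I\)) are an arbitrary choice:

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps = 500, 4000

# Two independent exponential(1) components: mu = (1, 1), Sigma = identity.
mu = np.array([1.0, 1.0])
draws = rng.exponential(size=(reps, n, 2))

# sqrt(n) * (sample mean - mu) for each replication.
scaled = np.sqrt(n) * (draws.mean(axis=1) - mu)

# The covariance of the scaled means should be close to Sigma = I.
emp_cov = np.cov(scaled.T)
```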
Theorem 2 (Asymptotic normality of the maximum likelihood estimator) Let \(\hat{\theta}\) be a maximum likelihood estimator and \(\theta\) be the true parameter value. Then \(\hat{\theta}\) is asymptotically normal with limiting variance equal to the inverse Fisher information, i.e., \[\sqrt{n}(\hat{\theta}-\theta) \to N(0, I(\theta)^{-1})\]
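To see the theorem in action, consider the Bernoulli model, where the Fisher information is \(I(p) = 1/(p(1-p))\); the parameter values below are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(1)
p, n, reps = 0.3, 1000, 5000

# The maximum likelihood estimator of a Bernoulli probability is the sample mean.
phat = rng.binomial(n, p, size=reps) / n

# The variance of sqrt(n) * (phat - p) should be close to
# 1 / I(p) = p * (1 - p) = 0.21.
var_scaled = np.var(np.sqrt(n) * (phat - p))
```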
The estimated covariance matrix of the parameter estimates can be found by calling `cov_params()` on the fitted model.
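A minimal sketch, assuming `statsmodels`; the simulated dataframe and coefficient values below are made up for illustration:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(2)
x = rng.normal(size=300)
# Simulate binary outcomes from a logistic model with known coefficients.
p = 1 / (1 + np.exp(-(0.5 + 1.0 * x)))
df = pd.DataFrame({"x": x, "y": (rng.uniform(size=300) < p).astype(int)})

fit = smf.logit("y ~ x", data=df).fit(disp=0)
cov = fit.cov_params()  # estimated covariance matrix of (Intercept, x)
```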
| Diaphragm use | Urinary tract infection (Yes) | Urinary tract infection (No) |
|---|---|---|
| Yes | 7 | 0 |
| No | 140 | 290 |
Model: \(P(\text{Urinary tract infection}) = F(\beta_0+\beta_11[\text{Diaphragm use}]),\) where \(F\) is equal to, e.g., the logistic CDF.
Example of perfect separation
Definition 1 A binary regression model is quasi-perfectly separated if there is a \(\beta \neq 0\) such that \(\beta^T X_0 \geq 0\) for every covariate vector \(X_0\) with outcome \(0\) and \(\beta^T X_1 \leq 0\) for every covariate vector \(X_1\) with outcome \(1\).
Theorem 3 The maximum likelihood estimates of a binary regression model are finite if and only if the regression model is not quasi-perfectly separated.
More detailed explanations can be found here.
A researcher wishes to know if caffeine improves performance on a memory test. Volunteers consume different amounts of caffeine from 0 to 500 mg, and their score on the memory test is recorded. Source.
| | caffeine | n | a | prob |
|---|---|---|---|---|
| 0 | 0.0 | 30.0 | 10.0 | 0.33 |
| 1 | 50.0 | 30.0 | 13.0 | 0.43 |
| 2 | 100.0 | 30.0 | 17.0 | 0.57 |
| 3 | 150.0 | 30.0 | 15.0 | 0.50 |
| 4 | 200.0 | 30.0 | 10.0 | 0.33 |
Here n gives us the number of observations with the given covariate levels and a the number who got an A!

```python
import numpy as np
import matplotlib.pyplot as plt

# The logit function (the inverse of the logistic CDF) maps probabilities to log-odds.
logit = lambda x: np.log(x) - np.log(1 - x)
plt.plot(data.caffeine, logit(model.predict()))
plt.scatter(data.caffeine, logit(data.prob))
```
```python
model_log_probit = smf.glm(
    "a + I(n - a) ~ caffeine + np.log(caffeine + 1)",
    family=sm.families.Binomial(sm.genmod.families.links.probit()),
    data=data).fit()
model_log_cauchit = smf.glm(
    "a + I(n - a) ~ caffeine + np.log(caffeine + 1)",
    family=sm.families.Binomial(sm.genmod.families.links.cauchy()),
    data=data).fit()
model_log_cloglog = smf.glm(
    "a + I(n - a) ~ caffeine + np.log(caffeine + 1)",
    family=sm.families.Binomial(sm.genmod.families.links.cloglog()),
    data=data).fit()
```
```python
{"logit": model_log.aic, "probit": model_log_probit.aic,
 "cauchit": model_log_cauchit.aic, "cloglog": model_log_cloglog.aic}
```

```
{'logit': 45.844885174715465,
 'probit': 45.37382082095568,
 'cauchit': 52.37445834568989,
 'cloglog': 46.901711989146094}
```
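Lower AIC is better, so the probit link comes out (weakly) ahead here. Picking the winner programmatically, using the printed values rounded to two decimals:

```python
# AIC values copied from the model comparison output above (rounded).
aics = {"logit": 45.84, "probit": 45.37, "cauchit": 52.37, "cloglog": 46.90}

# The link with the smallest AIC is weakly preferred.
best = min(aics, key=aics.get)

print(best)  # probit
```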